Machine learning-driven assessment of biochemical qualities in tomato and mandarin using RGB and hyperspectral sensors as nondestructive technologies

Estimation of fruit quality parameters is usually based on destructive techniques, which are tedious, costly and unreliable when dealing with large numbers of fruits. Alternatively, non-destructive techniques such as image processing and spectral reflectance would be useful for rapid detection of fruit quality parameters. This study aimed to assess the potential of image processing, spectral reflectance indices (SRIs), and machine learning models such as decision tree (DT) and random forest (RF) to estimate quality characteristics of mandarin and tomato fruits at different ripening stages. Quality parameters such as chlorophyll a (Chl a), chlorophyll b (Chl b), total soluble solids (TSS), titratable acidity (TA), TSS/TA, carotenoids (car), lycopene and firmness were measured. The results showed that Red-Green-Blue (RGB) indices and newly developed SRIs were highly efficient for quantifying different fruit properties. For example, the R² of the relationships between all RGB indices (RGBI) and the measured parameters varied between 0.62 and 0.96 for mandarin and between 0.29 and 0.90 for tomato. RGBI such as the visible atmospheric resistant index (VARI) and normalized red (Rn) presented the highest R² (0.96) with car of mandarin fruits, while the excess red vegetation index (ExR) presented the highest R² (0.84) with car of tomato fruits. SRIs such as RSI 710,600 and R730,650 showed the greatest R² values with respect to Chl a (R² = 0.80) for mandarin fruits, while the GI had the greatest R² with Chl a (R² = 0.68) for tomato fruits. Combining RGBI and SRIs with DT and RF models would be a robust strategy for estimating the eight observed variables with reasonable accuracy. For mandarin fruits, in the task of predicting Chl a, the DT-2HV model delivered exceptional results, registering an R² of 0.993 with an RMSE of 0.149 for the training set, and an R² of 0.991 with an RMSE of 0.114 for the validation set. For tomato fruits, the DT-5HV model performed well in Chl a prediction, achieving an R² of 0.905 and an RMSE of 0.077 for the training dataset, and an R² of 0.785 with an RMSE of 0.077 for the validation dataset. Overall, the RGBI, the newly developed SRIs, and the DT and RF models based on RGBI and SRIs could be used to evaluate the measured parameters of mandarin and tomato fruits.


Introduction
One of the fastest-growing agribusiness industries in Egypt is the production of fruits and vegetables. Citrus and vegetable crops are well adapted to Egypt's temperate environment. The main fruit crop grown and produced in Egypt is citrus, followed by mangoes, grapes, olives, and bananas. Citrus fruits such as oranges (Citrus aurantium) and mandarins (Citrus reticulata) are among Egypt's top exports; in 2020 Egypt was considered the world's top exporter of citrus, exporting nearly 2 million Mg [1]. In Egypt, mandarins account for roughly 25% of total citrus production. The total planted area is about 46,036 ha, with a total production of 1,038,753 Mg and an average productivity of 22.6 Mg/ha. Citrus juice extracts contain important antioxidants because they are a rich source of phenolic compounds [2]. Ahmed et al. [3] also reported that citrus juice is a rich source of ascorbic acid, vitamins and antioxidants that are important for human health.
Tomato (Solanum lycopersicum) is also one of the most important vegetable crops in Egypt, with a cultivated area of 168,000 ha and a production of 8 million Mg in 2021. According to FAO [1], Egypt ranks sixth in tomato exports, with an average export quantity and value of 57.9 thousand Mg and 32.2 million dollars, respectively. Tomato fruits are rich in lycopene, which plays an important role in reducing the incidence of various diseases such as cardiovascular disease, osteoporosis, heart disease and cancer. In addition, tomato fruits are rich in vitamins A, K and C and in potassium [4][5][6].
Various biochemical properties, including total chlorophyll, chlorophyll a and b, carotenoids, and soluble solid content (SSC), can be used effectively as fruit quality or maturity/ripening parameters. The determination of fruit quality parameters mainly depends on destructive methods, which are difficult to apply when large numbers of observations are needed. These methods are also unreliable for tracking rapid changes in fruit quality parameters. According to Wanitchang et al. [7], common destructive methods for measuring fruit ripeness indicators, such as pH, total soluble solids, TSS/TA, and chlorophyll content, destroy the fruit, are time-consuming and expensive, and delay export.
Image processing and spectral reflectance measurements can be robust alternatives to conventional methods of assessing fruit quality attributes [8]. Image processing captures fruit images and their spatial data, while spectral reflectance provides data on the chemical and physical properties of fruits. Used together, the two techniques allow fruit images and spectral data to be collected simultaneously [9].
Previous studies investigated the feasibility of spectral reflectance measurements for estimating biochemical properties of fruit as diagnostic indicators of fruit quality. Passive remote sensors mainly depend on sunlight as the illumination source, which enables hyperspectral data to be collected in the visible (VIS) and near infrared (NIR) regions of the electromagnetic spectrum [9][10][11]. Elsayed et al. [9] reported that the newly developed (NDVI-VARI)/(NDVI+VARI) index showed a remarkable correlation with total chlorophyll, chlorophyll a and chlorophyll b of mango, with R² values of 0.71, 0.71 and 0.78, respectively. Salah et al. [12] used a spectroscopic technique to predict the chemical properties of orange fruits at different growth stages, showing that the NDIs and R672/R550 had strong significant correlations with chlorophyll b and chlorophyll a (R² = 0.84 and 0.92, respectively), while PSI and R672/R550 had the highest correlation with TSS. Borba et al. [13] quantified the quality properties of tomato fruits non-destructively using spectral reflectance. They concluded that total soluble solids (TSS), titratable acidity (TA) and citric acid can be estimated rapidly and efficiently using spectroscopy measurements.
The assessment of fruit quality using SRIs often yields inconsistent outcomes across diverse geographical and environmental conditions. Therefore, there is an ongoing need to refine SRIs and RGBI so that they can serve as a rapid and straightforward approach for accurately estimating fruit quality parameters. It is important to identify the optimal algorithmic formulations for computing the various fruit quality attributes, thereby making remotely acquired data more effective in the evaluation of fruit quality. Prior studies have typically concentrated on using published SRIs to assess fruit quality attributes. As far as the authors are aware, only a limited number of studies have looked into the simultaneous application of distinct techniques for fruit property assessment. The distinctive advantage of the present investigation lies in the methodology used to select the most suitable SRIs for evaluating fruit quality parameters. In this regard, correlogram maps stand out as a noteworthy approach, improving the study's ability to identify and employ the most effective SRIs.
SRIs offer a straightforward approach for estimating biochemical parameters, with the potential to enable a portable and lightweight instrument for rapid and cost-effective assessment and management of biochemical parameters at scale. However, each SRI is constrained by a finite set of band combinations. The challenge lies in formulating robust SRIs for assessing fruit quality attributes under diverse and potentially confounding conditions. These conditions include substantial variations in the dimensions of fruit components and their impact on the saturation level of the quality parameters under study. As a subset of artificial intelligence (AI), machine learning (ML) has grown quickly in this context. ML can use large amounts of spectral data to extract important information for accurate classification and prediction [14]. Model-based feature selection can identify a subset of features with substantial discriminative and predictive potential, as demonstrated by Beltrán et al. [15]. This strategy can improve model performance by mitigating overfitting and removing extraneous features. Additionally, retaining the initial feature representation can improve interpretability, as highlighted by Guyon and Elisseeff [16]. The significance of feature selection algorithms in modeling and prediction has been steadily rising, as underscored by Schuize et al. [17]. Numerous approaches have been investigated for reducing data dimensionality, including Decision Trees (DT) and Random Forest (RF). In the RF model, variable importance is assessed based on the methodology outlined by Strobl et al. [18]. Glorfeld et al. (2019) introduced a back-propagation neural network metric aimed at identifying the most important variables within a given context [19]. Furthermore, hyper-parameter selection has a substantial influence on the performance of ML models. For instance, it can improve the efficacy of ML algorithms [20], foster fairness and reproducibility in scientific studies [21], and directly influence the training dynamics of algorithms, thereby playing a pivotal role in improving predictive models [22].
The overarching objective of this study was to assess the effectiveness of both RGBI and SRIs as non-destructive techniques for estimating the characteristics of mandarin and tomato fruits, as well as for detecting the quality parameters of these fruits at various stages of maturity. To achieve this, the study set out to accomplish the following specific goals: (i) quantify the quality parameters of mandarin and tomato fruits at different stages of ripening; (ii) evaluate the suitability of both conventional and newly developed SRIs for quantifying the quality parameters of mandarin and tomato fruits; and (iii) assess the performance of DT and RF models, based on RGBI and SRIs, in predicting the quality parameters of mandarin and tomato fruits.

Plant material
The experiments were conducted on mandarin and tomato fruits in the Laboratory of the Faculty of Agriculture, Tanta University, Gharbia Governorate, Egypt (30˚47' 18.00"N and 30˚59' 54.61"E). Fruit samples were collected from a private farm in Gharbia Governorate at different stages of ripening. The fruits were randomly selected and harvested manually. The experiments were conducted throughout 2021 to predict the quality attributes of mandarin and tomato fruits using RGB indices and SRIs and to link them to the biochemical properties of the fruits. Balady mandarin (Citrus reticulata, Blanco) fruits from seven-year-old trees were collected at three distinct phases of ripening: the mature stage, characterized by predominantly green coloration; the semi-ripening stage, exhibiting a combination of green and yellow hues; and the ripening stage, marked by a vivid orange hue (Fig 1). Tomato fruits (Solanum lycopersicum, Alissa F1) were collected at four ripening stages (dark green, yellowish green, light red and dark red), as shown in Fig 2, and were used for laboratory analysis.

Chemical parameters

Chlorophyll a, chlorophyll b, carotenoids and lycopene.
A spectrophotometer was used to measure the absorbance at certain wavelengths of 663, 645, 480 and 503 nm to determine the content of Chl a, Chl b, car [23] and lycopene [24] of crude extracts in mandarin and tomato fruits using the following equations:

PLOS ONE
Chlorophyll b (mg/g tissue) and total carotenoids (mg/g FW): [...]

$$\text{Lycopene } (\mu\text{g/g fr. wt.}) = \frac{3.121 \times \text{OD value at 503 nm} \times \text{vol. of sample} \times \text{dilution factor}}{\text{fresh weight of sample (g)}} \qquad (4)$$

where A = absorbance at specific wavelengths, W = fresh weight, and V = final volume of chlorophyll extract.
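As a minimal sketch of how Eq 4 could be evaluated, the function below multiplies the optical density at 503 nm by the sample volume and dilution factor and divides by the fresh weight. The function name, argument names, and the example readings are hypothetical and are not taken from the study.

```python
def lycopene_ug_per_g(od_503, sample_volume, dilution_factor, fresh_weight_g):
    """Lycopene content (ug/g fresh weight) following the form of Eq 4."""
    if fresh_weight_g <= 0:
        raise ValueError("fresh weight must be positive")
    return 3.121 * od_503 * sample_volume * dilution_factor / fresh_weight_g


# Hypothetical reading: OD = 0.5, volume = 10, dilution factor = 2, 5 g of tissue.
example_lycopene = lycopene_ug_per_g(0.5, 10.0, 2.0, 5.0)
print(round(example_lycopene, 3))
```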

Total soluble solids (TSS).
A handheld refractometer (Milwaukee, model MA871, Brookfield, WI, USA) was used to measure the TSS in juice extract from tomato and mandarin fruits and the data was expressed as Brix (%) according to Cheour et al. [25].

Titratable acidity (TA).
The titratable acidity of tomato and mandarin juice extracts, expressed as a percentage of anhydrous citric acid, was measured by titrating a known volume of fruit juice against standard 0.1 N NaOH, using 1% phenolphthalein as an indicator, according to A.O.A.C. [26].

Maturity index (TSS/TA).
The TSS/TA ratio was calculated from the TSS values divided by the TA values.

Physical parameter
The firmness of each mandarin and tomato fruit was measured using a digital fruit hardness tester (IC-FR5120, China) with a 6 mm probe.

Image analysis
ENVI 4.6 (Environment for Visualizing Images; ITT Visual Information Solutions, Boulder, CO, USA) was used for visualizing and analyzing the digital images. A large number of ENVI routines are available, covering almost all functions of the interactive ENVI software; each processing routine is an IDL procedure or function and is used like any other IDL routine. The image analysis yielded the average value of each of the three RGB bands. The mandarin and tomato fruits were photographed with a Nikon D5300 camera (Nikon Corporation, Tokyo, Japan), a 24.2-megapixel digital single-lens reflex (DSLR) camera with an 18-55 mm lens. The camera was hand-held and directed vertically downwards towards the fruits at a distance of 30 cm. The measurements were carried out under cloudy conditions to guarantee high image resolution. The camera flash was kept off during all measurements. The digital photographs were converted from JPEG to TIF file format using IrfanView 4.37. The SPSS 22 package was used to calculate the selected RGBI from the red (R), green (G), and blue (B) pixel values (Table 1).
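The index calculation described above can be sketched in a few lines of Python. This is an illustration only, not the authors' SPSS workflow: it assumes the mean R, G and B values of a fruit image are already available, uses the formulae listed in Table 1 for IKAW, ExR, GRRI, GRVI, NDI and INT, and takes Rn, Gn and Bn as the usual normalized chromatic coordinates (each band divided by R + G + B). The function name and the example pixel values are hypothetical.

```python
def rgb_indices(r, g, b):
    """Compute several Table 1 indices from mean R, G, B digital numbers."""
    r, g, b = float(r), float(g), float(b)
    total = r + g + b
    # Normalized chromatic coordinates (assumed definition: band / (R + G + B)).
    rn, gn, bn = r / total, g / total, b / total
    return {
        "Rn": rn,
        "Gn": gn,
        "Bn": bn,
        "IKAW": (r - b) / (r + b),            # Kawashima index
        "ExR": 1.4 * rn - gn,                 # excess red vegetation index
        "GRRI": g / r,                        # green red ratio index
        "GRVI": (g - r) / (g + r),            # green-red vegetation index
        "NDI": (rn - gn) / (rn + gn + 0.01),  # normalized difference index
        "INT": total / 3.0,                   # color intensity index
    }


# Hypothetical mean pixel values for an orange-colored fruit surface.
example_indices = rgb_indices(200, 120, 40)
for name, value in example_indices.items():
    print(f"{name}: {value:.3f}")
```

Working from band means keeps the sketch independent of any image library; in practice the means would be taken over the fruit pixels only, after masking out the background.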

Spectral reflectance measurements
After the mandarin and tomato samples representing the distinct ripening stages had been collected, spectral data for each specimen were obtained using a passive reflectance sensor (HandySpec Field, tec5, Oberursel, Germany). The spectral range spanned wavelengths from 302 to 1148 nm, with an optical bandwidth of 2 nm and a viewing angle of 12 degrees. Each sample was scanned three times. To avoid exposure differences, the spectroradiometric measurements of the different samples were performed in full sunlight over short periods of time. Spectral reflectance of the different samples was matched using calibration factors derived from a white reference standard. During the spectral measurements, a black sheet was placed beneath the fruit specimen to suppress reflections from the surrounding background, so that the recorded spectra primarily represented the reflective characteristics of the fruit itself. From the readings of the spectrometer unit, the reflectance of the fruit was calculated and corrected using calibration factors taken from the reference gray standard. Spectroradiometric measurements were taken from the vertical position, approximately 30 cm above the fruit, on clear days.
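To illustrate how SRIs are derived from such spectra, the sketch below computes a ratio index (for example R710/R600), a normalized difference index (for example NAI from R760 and R720), and PRMI from a reflectance curve sampled every 2 nm between 302 and 1148 nm. The helper names and the synthetic spectrum are assumptions made for illustration; real spectra would come from the sensor after reference calibration.

```python
import numpy as np


def band(wavelengths, reflectance, target_nm):
    """Reflectance at the sampled wavelength closest to target_nm."""
    idx = int(np.argmin(np.abs(np.asarray(wavelengths) - target_nm)))
    return float(np.asarray(reflectance)[idx])


def ratio_index(wavelengths, reflectance, nm1, nm2):
    """Simple ratio index R_nm1 / R_nm2."""
    return band(wavelengths, reflectance, nm1) / band(wavelengths, reflectance, nm2)


def normalized_difference(wavelengths, reflectance, nm1, nm2):
    """Normalized difference index (R_nm1 - R_nm2) / (R_nm1 + R_nm2)."""
    r1 = band(wavelengths, reflectance, nm1)
    r2 = band(wavelengths, reflectance, nm2)
    return (r1 - r2) / (r1 + r2)


def prmi(wavelengths, reflectance):
    """Pigment-sensitive ripening monitoring index (R750 - R678) / R550."""
    r750 = band(wavelengths, reflectance, 750)
    r678 = band(wavelengths, reflectance, 678)
    return (r750 - r678) / band(wavelengths, reflectance, 550)


# Synthetic, monotonically increasing spectrum used only to exercise the code.
wl = np.arange(302, 1149, 2)
refl = np.linspace(0.05, 0.60, wl.size)

rsi_710_600 = ratio_index(wl, refl, 710, 600)
nai = normalized_difference(wl, refl, 760, 720)
print(rsi_710_600, nai, prmi(wl, refl))
```

Selecting the nearest sampled band avoids interpolation; when the three scans of a sample are available, averaging them before computing the indices would be a natural extension.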
Table 1. Index abbreviations, formulae, and references of selected RGB indices for digital image analysis.

Index (abbreviation) | Formula | Reference
Normalized red (Rn) | R/(R + G + B) | [27]
Normalized blue (Bn) | B/(R + G + B) | [27]
Kawashima index (IKAW) | (R - B)/(R + B) | [9]
Excess red vegetation index (ExR) | 1.4 × rn - gn | [28]
Excess green minus excess red index (ExGR) | ExG - ExR | [29]
Color index of vegetation (CIVE) | 0.441 × R - 0.881 × G + 0.385 × B + 18.78745 | This study
Green red ratio index (GRRI) | G/R | [30]
Green-red vegetation index (GRVI) | (G - R)/(G + R) | This study
Normalized difference index (NDI) | (rn - gn)/(rn + gn + 0.01) | [12]
Visible atmospheric resistant index (VARI) | [...] | [...]
Normalized difference vegetation index 1 (NDVI1) | [...] | [...]
Normalized difference vegetation index (NDVI) | [...] | [...]
Color intensity index (INT) | (R + G + B)/3 | [34]
https://doi.org/10.1371/journal.pone.0308826.t001

Index (abbreviation) | Formula | Reference
Anthocyanin index (NAI) | (R760 - R720)/(R760 + R720) | [37]
Greenness index (GI) | R554/R677 | [38]
Pigment-sensitive ripening monitoring index (PRMI) | (R750 - R678)/R550 | [11]
https://doi.org/10.1371/journal.pone.0308826.t002

Machine learning models

Random Forest (RF).
Random forest (RF) is a versatile technique built on ensembles of classification or regression trees and is well suited to assessing the relationships between a set of independent variables and a dependent variable. It partitions the dataset into nodes, forming homogeneous subsets through recursive partitioning in each of a number of regression trees (ntree), and then aggregates the outcomes from all these trees. In its growth phase, each tree is grown to its greatest extent from a bootstrap sample of the training dataset, and an element of randomness is introduced within each tree by selecting a random subset of variables (mtry) to estimate the split at each node [18]. The training process of RF takes two critical factors into consideration: ntree and mtry. Specifically, ntree is the number of trees grown, with the number of trained features ranging from 1 to 20, while mtry corresponds to the random subset of features selected for node splitting. To optimize the model and minimize the root mean squared error of validation (RMSEV), leave-one-out validation (LOOV) was employed to fine-tune the two parameters. The parameter ntree was examined in the range of 1 to 25, and the optimal value of mtry was assessed by varying the number of features used. Once the model had been trained with the optimal parameters (Fig 5), all the features were ranked, and the most valuable features were selected based on variable importance statistics [39]. Throughout this iterative process, the outputs were collected and multiple combinations of features were assessed to identify the feature set that yielded the lowest RMSEV.
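The tuning loop described above might be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' code: `n_estimators` and `max_features` stand in for ntree and mtry, the small grids and the synthetic data are hypothetical, and scikit-learn's impurity-based `feature_importances_` is used as a stand-in for the variable importance statistics cited in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def tune_rf_loo(X, y, ntree_grid, mtry_grid, seed=0):
    """Return (RMSEV, ntree, mtry) with the lowest leave-one-out RMSE."""
    best = None
    for ntree in ntree_grid:
        for mtry in mtry_grid:
            model = RandomForestRegressor(
                n_estimators=ntree, max_features=mtry, random_state=seed
            )
            # Each sample is predicted by a model trained on all other samples.
            pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
            rmsev = float(np.sqrt(np.mean((y - pred) ** 2)))
            if best is None or rmsev < best[0]:
                best = (rmsev, ntree, mtry)
    return best


# Synthetic data: 20 samples, 4 candidate indices, target driven by the first.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=20)

rmsev, ntree, mtry = tune_rf_loo(X, y, ntree_grid=(5, 15, 25), mtry_grid=(1, 2, 4))

# Refit with the selected settings and rank features by importance.
final_model = RandomForestRegressor(
    n_estimators=ntree, max_features=mtry, random_state=0
).fit(X, y)
ranking = np.argsort(final_model.feature_importances_)[::-1]
print(rmsev, ntree, mtry, ranking)
```

Feature subsets could then be evaluated by repeating `tune_rf_loo` on the top-ranked columns and keeping the subset with the lowest RMSEV.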

Decision Tree (DT).
Decision tree induction is the process by which decision trees are generated from training data annotated with labels. These trees take the form of flowchart-like structures consisting of a root node, decision nodes, and leaf nodes. The root node marks the starting point of the tree, the decision nodes make choices that guide the progression from one node to another, and the leaf nodes represent the final outcomes determined by the tree. Not all decision tree algorithms produce the same types of trees. Some, such as the CART (Classification and Regression Trees) algorithm, are constrained to produce binary trees, in which every internal node has exactly two child nodes, while others can generate non-binary trees [40]. During the training phase, three critical factors were considered: the maximum depth of the tree (Md), the minimum number of samples allowed per leaf (Ms), and the maximum number of leaf nodes (Mln). Specific parameter values were selected for each of these factors, including Md values of 1, 3, 5, 7, and 9, Ms values of 2, 4, 6, 8, and 10, and Mln [...] strategic approach served the dual purpose of mitigating overfitting tendencies and enhancing the model's predictive capabilities [41]. The entire process, from data preparation and analysis to model creation, was executed using Python 3.7.3. For the regression tasks, the RF and DT modules from the Scikit-learn package version 0.20.2 were employed. The analysis was performed on a machine equipped with an Intel Core i7-3630QM processor clocked at 2.4 GHz and 8 GB of RAM.
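A hyper-parameter search of this kind could be sketched with scikit-learn's `GridSearchCV`. The Md and Ms grids below follow the values quoted in the text; the Mln values, the 5-fold cross-validation, the scoring choice, and the synthetic data are assumptions introduced only to make the example runnable.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

param_grid = {
    "max_depth": [1, 3, 5, 7, 9],          # Md values quoted in the text
    "min_samples_leaf": [2, 4, 6, 8, 10],  # Ms values quoted in the text
    "max_leaf_nodes": [5, 10, 20],         # Mln: hypothetical placeholder values
}

# Synthetic data: 40 samples, 3 candidate indices, step-like response.
rng = np.random.default_rng(1)
X = rng.uniform(size=(40, 3))
y = np.where(X[:, 0] > 0.5, 2.0, 0.5) + rng.normal(scale=0.05, size=40)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # MSE works for any fold size
    cv=5,                              # assumed; the text does not state the scheme
)
search.fit(X, y)

best_params = search.best_params_
cv_rmse = float(np.sqrt(-search.best_score_))
print(best_params, cv_rmse)
```

Constraining depth, leaf size, and leaf count together limits how closely the tree can fit individual samples, which is the overfitting control the paragraph above refers to.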
Model assessment.
To evaluate the efficacy of a regression model, two statistical metrics were employed: the coefficient of determination (R²) and the root mean square error (RMSE), according to Eqs 5 & 6 [42,43]. The parameters used in this assessment are defined as follows: Y_act is the actual laboratory-derived value, Y_p is the predicted or simulated value, Y_ave is the mean value, and T is the total number of data points.
Root mean square error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{i=1}^{T}\left(Y_{act,i} - Y_{p,i}\right)^{2}} \qquad (6)$$
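The two metrics can be computed directly from paired actual and predicted values, as in the short sketch below. The R² expression used here is the common form 1 - SS_res/SS_tot, which is an assumption about Eq 5; the function names and the example numbers are hypothetical.

```python
import math


def rmse(y_act, y_pred):
    """Root mean square error over T paired observations."""
    t = len(y_act)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_act, y_pred)) / t)


def r_squared(y_act, y_pred):
    """Coefficient of determination, assumed to be 1 - SS_res / SS_tot."""
    y_ave = sum(y_act) / len(y_act)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_act, y_pred))
    ss_tot = sum((a - y_ave) ** 2 for a in y_act)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted values for four fruits.
y_measured = [1.0, 2.0, 3.0, 4.0]
y_predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_measured, y_predicted), r_squared(y_measured, y_predicted))
```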

Variation of different biochemical parameters, RGB and spectral reflectance indices of mandarin and tomato fruits
Ripening stage had a significant impact on the biochemical characteristics of mandarin and tomato. The Chl a, Chl b, TA, and firmness values of the fruits decreased during ripening, while TSS, TSS/TA, car, and lycopene increased as ripening progressed. Significant differences were found between the mean values of each biochemical parameter at the different ripening stages. The RGB indices and SRIs also changed with ripening stage (Tables 3 & 4). For mandarin, Chl a ranged from 0.03 to 4.24 mg/g tissue and car from 0.07 to 0.23 mg/g FW, while RGB indices such as VARI ranged from -0.65 to 0.34 and IKAW from 0.00 to 0.97 (Table 3). In addition, SRIs such as R710/R600 ranged from 1.26 to 3.91, and PRMI ranged from 0.28 to 5.78. For tomato, TSS ranged from 4.00 to 6.80 (Brix %) and firmness from 690.00 to 350.00 kgf/cm², while RGB indices such as INT ranged from 68.66 to 176.31 and ExR from 0.04 to 1.26 (Table 4). In addition, SRIs such as R546/R1132 ranged from 0.75 to 1.02, and NAI ranged from -0.02 to 0.04. Shravan et al. [44] reported that throughout fruit maturation the concentration of chlorophyll decreased significantly, while the concentrations of carotenoids and TSS increased. Thus, the color shift from green to red during ripening is due to the unmasking of pre-existing pigments as chlorophyll decomposes [45]. Brandt et al. [46] explained that the amount of lycopene decreases at the beginning of the growth stage and then increases during the fruit ripening stage. Thimann [47] reported that the chromoplasts also contain yellow or red carotenoids, such as lycopene, so the breakdown or disappearance of chlorophyll is accompanied by the accumulation of lycopene (a red pigment), turning the tomato fruit red.

Correlation analysis between all biochemical parameters of mandarin and tomato fruits
Correlation analysis revealed significant associations among Chl a, Chl b, TSS, TA, TSS/TA, carotenoids (car), lycopene, and fruit firmness, as shown in Tables 5 and 6. For mandarin fruits, the greatest correlation coefficient (r) was found between Chl a and Chl b (r = 1.00), and the lowest r value was found between TSS and TA (r = -0.75). All biochemical parameters showed statistically significant associations, with r ranging from -0.75 to 1.00 (Table 5).
For tomato fruits, the highest r value among all biochemical parameters was between firmness and TA (r = 0.94), while the lowest correlation coefficient (r = -0.45) was found between Chl a and TSS/TA. All biochemical parameters of tomato showed significant associations, with r ranging from -0.45 to 0.94. These results are in agreement with Elsayed et al. [9], who found that the TSS, car, Chl a, Chl b, total Chl and TA of mango were substantially linked. Galal et al. [14] found considerable negative relationships between TSS and Chl content of banana and orange fruit at various maturation stages.
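For readers who want to reproduce this kind of analysis, a Pearson correlation coefficient between two measured parameters can be computed as sketched below. The function name and the example values are hypothetical and do not correspond to the data in Tables 5 and 6.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


# Hypothetical measurements across four ripening stages.
tss = [4.0, 5.1, 6.0, 6.8]           # total soluble solids (Brix %)
firmness = [690.0, 560.0, 450.0, 350.0]

r_tss_firmness = pearson_r(tss, firmness)
print(round(r_tss_firmness, 3))

# The full correlation matrix for several parameters (rows are variables).
corr_matrix = np.corrcoef(np.vstack([tss, firmness]))
```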

Relationships between RGB indices and biochemical parameters of mandarin and tomato fruits
Twelve RGB indices acquired through digital image processing were related to the biochemical parameters of mandarin and tomato fruits, as shown in Tables 7 & 8. The findings demonstrated moderate to significant correlations between all biochemical parameters and the [...] For tomato fruits, GRRI, IKAW, and NDVI1 had the greatest R² with Chl a (R² = 0.66). For Chl b, the highest R² was obtained with GRRI (R² = 0.59). NDVI1 produced the highest coefficient of determination (0.90) with TA. The results further showed that Rn had the greatest R² (0.75) with TSS, whereas ExG and Gn had the lowest (R² = 0.55). Rn and GRRI gave the greatest R² for estimating firmness (R² = 0.85 for both).
In this study, RGB indices proved useful at different ripening stages of mandarin fruits. Other studies have used the surface color of the fruit as one of the main factors for determining the ripeness of mandarin fruits. For [...] The DT-12RGBI model was identified as the most reliable predictor of the TSS/TA ratio, with R² values of 0.99 for the training dataset and 0.98 for the validation dataset. The RMSE values were 0.408 for the training dataset and 0.508 for the validation dataset. These findings underscore the effectiveness of these models in predicting various biochemical parameters, thereby highlighting their potential utility in plant and soil ecology research.
The major variables of tomato fruits were identified from the parameters studied, as shown in Table 12, using both DT and RF models. These features were instrumental in estimating Chl a, Chl b, TSS, TA, TSS/TA, firmness, carotenoids, and lycopene. The accuracy (R² and RMSE) of the DT and RF models in predicting these parameters is presented in Table 12. The DT-3 SRIs model had the strongest relationship with Chl a and its distinctive features, making it the [...]

The SRIs and RGBIs were processed using two ML models, DT and RF, with the goal of selecting optimal features in each iteration. Subsequently, the most effective hybrid variables (HV) were identified to estimate the quality of the fruits, including mandarins (Table 13) and tomatoes (Table 14). First, regarding mandarin fruits: in the task of predicting Chl a, the DT-2HV model delivered exceptional results, registering an R² of 0.993 with an RMSE of 0.149 for the training set, and an R² of 0.991 with an RMSE of 0.114 for the validation set. For the prediction of Chl b, the RF-23HV model showed excellent accuracy, with R² values of 0.996 and 0.989 and RMSE values of 0.146 and 0.166 for the training and validation sets, respectively. Exceptional accuracy was shown by the RF-14HV model in predicting TSS, as [...]

Table 13. Results from the Random Forest (RF) and Decision Tree (DT) models, using the superior hybrid features of both spectral reflectance indices (SRIs) and RGB indices (RGBIs) together to predict various characteristics of mandarin fruits. These characteristics include chlorophyll b (Chl b), chlorophyll a (Chl a), total soluble solids (TSS), titratable acidity (TA), the TSS/TA ratio, and carotenoids (car).

Second, for tomato fruits: the DT-5HV model performed well in Chl a prediction, achieving an R² of 0.905 and an RMSE of 0.077 for the training dataset, and an R² of 0.785 with an RMSE of 0.077 for the validation dataset. For Chl b, the DT-4HV model attained R² values of 0.779 and 0.574 and RMSE values of 0.119 and 0.116 for the training and validation datasets, respectively. The DT-27HV model's TSS predictions showed an R² of 0.960 (RMSE = 0.118) for training and 0.793 (RMSE = 0.207) for validation. The DT-20HV model's performance in TA prediction was notably strong, with an R² of 0.982 and RMSE of 0.007 during training, and an R² of 0.896 and RMSE of 0.010 during validation. The [...] [8,53], which indicate that selecting high-level variables plays a crucial role in enhancing model optimization and improving the accuracy of the predicted outcomes.

In the future, leveraging advanced ML techniques with optimally combined features holds significant promise for further enhancing the accuracy and efficiency of fruit quality assessment. Continued refinement of these combined features and the exploration of novel algorithms can lead to more robust predictive models, applicable to a broader range of fruits and agricultural products. Additionally, integrating larger datasets and incorporating real-time data processing capabilities will facilitate more dynamic and precise quality monitoring systems, ultimately benefiting producers and consumers alike through improved product consistency and reduced waste.

Conclusions
A cost-effective approach for appraising the fruit quality parameters of mandarin and tomato at different levels of ripeness was developed using RGBI and SRIs in combination with DT and RF models. In terms of R² values, the RGBI and the newly created SRIs performed well in assessing the fruit quality parameters of mandarin and tomato. All tested RGBI and SRIs had significant associations with the biochemical parameters of mandarin and tomato, and there were statistically significant associations between all assessed SRIs derived from the VIS and NIR regions and the fruit attributes. Combining RGBI and SRIs with DT and RF models would be a robust strategy for estimating the eight observed variables with reasonable accuracy. The findings of this study provide a potential reference for estimating several parameters and offer technical support for monitoring and assessing the fruit quality of mandarin and tomato during ripening and storage. In conclusion, this information, which is based on the most reliable RGBI, SRI, DT and RF calibration models developed in the current study, has the potential to be used to build active measurement systems for tracking fruit quality and ripeness in the field or factory.

Table 6. Correlation coefficients between eight parameters, chlorophyll a (Chl a), chlorophyll b (Chl b), total soluble solids (TSS), titratable acidity (TA), TSS/TA, carotenoids (Car), lycopene, and firmness of tomato fruits.
For mandarin fruits, IKAW and NDVI1 presented the greatest R² of 0.91 with Chl a. VARI and Rn presented the highest R² of 0.96 with car. The greatest R² of Rn was with TSS (R² = 0.81).

Table 8. Coefficient of determination values of linear regression models of eight tomato fruit attributes, chlorophyll b (Chl b), chlorophyll a (Chl a), total soluble solids (TSS), titratable acidity (TA), TSS/TA, carotenoids (car), lycopene and firmness with twelve RGB indices.
[...] 7RGBI model outperformed other models as the optimal predictor, with R² scores of 0.99 for both the training and validation datasets. The RMSE values for this model were 0.193 and 0.134 for the training and validation datasets, respectively. For TSS, the RF-7RGBI model emerged as the most precise model, with R² values of 0.96 and 0.83 for the training and validation datasets, respectively.

https://doi.org/10.1371/journal.pone.0308826.t014
The RF-7HV model attained moderate accuracy in estimating TSS/TA, with an R² of 0.5515 and RMSE = 0.176 during training, and an R² of 0.364 and RMSE = 0.218 in validation. The DT-2HV model's prediction of car was robust, with R² scores of 0.891 and 0.873 and RMSEs of 0.013 and 0.011 for the training and validation phases, respectively. The DT-4HV model performed best in lycopene prediction, registering an R² of 0.978 and an RMSE of 2.969 in training, followed by an R² of 0.919 and an RMSE of 3.618 in validation. For firmness, the DT-12HV model performed exceptionally well, with an R² of 0.973 and an RMSE of 16.251 during the training phase, and an R² of 0.898 and an RMSE of 23.199 in the validation phase. These findings are consistent with previous studies